Introduction: Multiple myeloma (MM), a malignancy of plasma cells, poses diagnostic challenges because of its complex manifestations and variable outcomes. Accurate and rapid assessment of M-spike values is crucial for disease monitoring and management. The gold standard for monitoring MM treatment response is serum and urine protein electrophoresis, which quantifies the monoclonal (M-spike) protein; however, the turnaround time for results is 3-7 days, which delays treatment decisions. We hypothesized that machine learning (ML) could integrate structured clinical and laboratory data from the electronic health record (EHR) to predict patient M-spike values rapidly and accurately. In our initial study, we conducted a retrospective chart review of 171 MM patients from a single institution (E. Malek, et al. JCO Clinical Cancer Informatics 7 (2023): e2300078). A Random Forest model incorporating 43 independent variables from the EHR was used to predict M-spike values. The dataset was divided into training and test sets, and model performance was evaluated using Pearson and Spearman correlation coefficients. The model showed strong predictive capability, with high correlation coefficients, and feature selection identified the first and second lagged (prior) M-spike values and serum total protein as the three most important predictors. Building on this original work, in which the Random Forest model predicted M-spike values from same-day clinical and laboratory data, we externally validated the model in the HealthTree Foundation's larger, independent dataset.
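To make "first and second lagged M-spike" concrete, the sketch below pairs each visit's M-spike with the values from the one and two preceding visits. It is a hypothetical illustration, not the authors' pipeline (the original model was built in R with 43 variables): the function name `add_lag_features`, the single-patient list layout, and the example values are all assumptions.

```python
from typing import Optional


def add_lag_features(
    mspikes: list[float],
) -> list[tuple[float, Optional[float], Optional[float]]]:
    """Hypothetical helper: for each visit in a time-ordered list of M-spike
    values (g/dL), return (current, lag1, lag2), where lag1 is the previous
    visit's value and lag2 the value two visits back. Visits without enough
    history get None for the missing lags."""
    rows = []
    for i, current in enumerate(mspikes):
        lag1 = mspikes[i - 1] if i >= 1 else None
        lag2 = mspikes[i - 2] if i >= 2 else None
        rows.append((current, lag1, lag2))
    return rows


# Hypothetical, time-ordered M-spike values for one patient (g/dL), not study data.
history = [1.2, 0.9, 0.5, 0.3]
for row in add_lag_features(history):
    print(row)
```

In a real dataset the values would first be grouped by patient and sorted by collection date, and rows with missing lags would typically be dropped or imputed before model fitting; how the original study handled this is not stated here.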
Methods: We validated the model's performance in an external retrospective analysis of an independent dataset of 619 MM patients from the HealthTree Cure Hub national registry database. Each patient's most recent laboratory results were used for validation. The dataset contained de-identified clinical and laboratory parameters in the same structure used in our original model. A simplified Random Forest model, limited to the three most important predictors, was applied to the new dataset to predict M-spike values, and its performance was evaluated using the coefficient of determination (R²) and the root mean square error (RMSE) between predicted and actual M-spike values. Statistical analyses were performed in R with RStudio, the same software environment used in the initial study, to maintain consistency.
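Both validation metrics have standard definitions: R² = 1 - SS_res / SS_tot, and RMSE is the square root of the mean squared difference between actual and predicted values. The sketch below computes them with only the Python standard library; it is not the authors' R code, and the example values are made up for illustration, not study data.

```python
import math


def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared gaps between observed and predicted values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean square error, in the units of the target (g/dL for M-spike)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical observed and predicted M-spike values (g/dL), not study data.
observed = [0.1, 0.5, 1.0, 2.0]
predicted = [0.2, 0.4, 1.1, 1.8]
print(f"R^2  = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f} g/dL")
```

R² computed this way is the coefficient of determination, not the squared Pearson correlation used in the initial study; the two can differ when predictions are systematically biased, so results from the two studies are not directly interchangeable.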
Results: In the external validation cohort, the median M-spike value was 0.16 g/dL (range 0.01-2.5 g/dL). The model achieved an R² of 0.779 and an RMSE of 0.2619 g/dL, indicating substantial agreement between predicted and observed M-spike values. A scatter plot of predicted versus actual values further supported the model's accuracy across the range of M-spike measurements. The three predictors in the simplified model (serum total protein and the first and second lagged M-spike values) retained high importance, consistent with our initial findings and supporting the simplified model's accuracy across different MM patient cohorts.
Conclusion: External validation in the HealthTree dataset supports our initial findings and confirms the potential of our Random Forest model as a reliable tool for rapid M-spike prediction in MM patients. Because it provides predictions faster than conventional electrophoresis, the model can substantially accelerate clinical decision-making, enabling more timely interventions and reducing the anxiety patients experience while waiting for test results. This model represents a significant advance in the rapid assessment of disease burden in MM, facilitating prompt and personalized treatment decisions. Future research should focus on continuous refinement of the model by integrating more diverse datasets, predicting treatment outcomes under the International Myeloma Working Group (IMWG) criteria, and validating the model in clinical settings to further enhance its clinical utility.
Disclosures: Malek: Janssen: Consultancy, Speakers Bureau; BMS: Consultancy; MedPacto: Research Funding; Adaptive Bio: Consultancy. Ahlstrom: Pfizer: Other: Patient advocacy; BMS: Other: Patient advocacy; Janssen: Other: Patient advocacy; Takeda Oncology: Other: Patient advocacy; Sanofi: Other: Patient advocacy. Hydren: Sanofi: Research Funding; Takeda Oncology: Research Funding; BioLinRx: Research Funding; Johnson and Johnson Innovative Medicine: Research Funding; Regeneron: Research Funding; GlaxoSmithKline: Research Funding; Pfizer: Research Funding; Adaptive Biotechnologies: Research Funding.